Adaptive Video Transcoding

ABSTRACT

Embodiments of the invention describe a method for transcoding an input video in a first encoded format to an output video in a second encoded format, wherein the videos include a set of segments and each segment includes frames. First, the method determines a set of downsample resilient segments in the input video and a set of full-resolution segments in the input video. Next, the method downsamples the set of downsample resilient segments to produce a set of downsampled segments, and transcodes the input video using the set of full-resolution segments and the set of downsampled segments to produce the output video, which includes at least two segments with different resolutions.

FIELD OF THE INVENTION

The invention relates generally to video processing, and more particularly to adaptive video transcoding.

BACKGROUND OF THE INVENTION

Transcoding is the digital-to-digital conversion of one encoded video to another encoded video. Video transcoding methods convert a digital video, i.e., a bitstream, from a first encoded format to a second encoded format. The second format can provide additional benefits, such as reduced storage and transmission requirements. For example, a video recorder can use video transcoding to convert a video in the MPEG-2 format to the H.264/AVC format, to take advantage of the improved compression efficiency of the H.264/AVC format.

Typically, a transcoder includes a decoder connected to an encoder. For example, an MPEG-2 decoder connected to an H.264/AVC encoder forms a reference transcoder. The reference transcoder is computationally complex due to the need to perform motion estimation in the H.264/AVC encoder. The complexity of the reference transcoder can be reduced by reusing motion and mode information from the input MPEG-2 video bitstream. However, the reuse of such information in the most cost-effective and useful manner is a known problem.

To reduce the complexity of a reference MPEG-2-to-H.264/AVC transcoder, methods such as mapping motion vectors or reducing the resolution, i.e., downsampling, during transcoding have been described.

In a conventional video transcoder, video data are typically transformed, in part, by a quantizer. A fine quantizer produces high-quality compressed video with a large bit-rate or storage requirement. A coarse quantizer produces low-quality compressed video with reduced storage requirements.

The performance of the encoder or the transcoder can be improved for a given bit-rate by reducing the resolution of a frame of a video before the transcoding operations, and then increasing the resolution after decoding the encoded video. Because the resolution of the video has been reduced, a finer quantizer can be used for a given bit-rate.

However, the trade-off between resolution and quantizer noise sometimes leads to a reduction in video quality. Fine details in the video can be blurred by downsampling to such an extent that after being decoded and upsampled, visible artifacts appear in the video, even when a very fine quantizer has been used.

Conventional transcoding methods either reduce the resolution of a video before the transcoding operation, which decreases the quality of the subsequently decoded video, or encode full-resolution video, which increases the complexity of the transcoding operations.

It is desired to reduce the complexity of the video transcoding operation without decreasing the quality of a subsequently decoded video.

SUMMARY OF THE INVENTION

It is an object of the invention to provide a method for reducing a complexity of a video transcoding without decreasing a quality of a subsequently decoded video.

It is a further object of the invention to provide a method that enables switching adaptively between full and reduced-resolution transcoding, based on the content of the video.

The embodiments of the invention are based on a realization that different segments of the same video have different sensitivities to the downsampling operation. Thus, by downsampling, before the transcoding, only those segments of the video that are resilient to downsampling, the overall complexity of the video transcoding is reduced without decreasing the quality of the subsequently decoded and upsampled video. Moreover, the segments that are resilient to downsampling are selected based on the content of the video itself, enabling adaptive switching between full and reduced-resolution transcoding.

One embodiment of the invention describes a method for transcoding an input video in a first encoded format to an output video in a second encoded format, wherein the videos include a set of segments and each segment includes frames, comprising a processor for performing steps of the method, comprising the steps of: determining a set of downsample resilient segments in the input video; determining a set of full-resolution segments in the input video; downsampling the set of downsample resilient segments to produce a set of downsampled segments; and transcoding the input video using the set of full-resolution segments and the set of downsampled segments to produce the output video including at least two segments with different resolutions.

Another embodiment describes an adaptive video transcoder, comprising: an adaptive resolution selector configured to determine a set of downsample resilient segments and a set of full-resolution segments in an input video; a downsampling module configured to downsample the set of downsample resilient segments to produce a set of downsampled segments; and a transcoding module configured to transcode the input video using the set of downsampled segments and the set of full-resolution segments to produce an output video having at least two segments of different resolution.

Yet another embodiment describes a method for adaptive video transcoding of an input video in a first encoded format into an output video in a second encoded format, wherein each segment of the input video has a constant resolution, comprising a processor for performing steps of the method, comprising the steps of: determining a set of downsample resilient segments in the input video; and transcoding the input video into the output video, such that a resolution of only the set of downsample resilient segments in the output video is reduced.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a method and a system for adaptively transcoding a video based on a content of the video according to embodiments of the invention;

FIG. 2 is a block diagram of a method for adaptively transcoding a video based on quality metrics of a full-resolution video according to an embodiment of the invention;

FIG. 3 is a block diagram of a method for adaptively transcoding a video based on bitstream information according to an embodiment of the invention;

FIG. 4 is a block diagram of an adaptive-resolution transcoder according to an embodiment of the invention;

FIG. 5 is a block diagram of an adaptive-resolution transcoder based on quality metrics according to an embodiment of the invention; and

FIG. 6 is a block diagram of an adaptive-resolution transcoder based on compressed data according to an embodiment of the invention.

DESCRIPTION OF THE INVENTION

FIG. 1 shows a method and a system 100 for adaptively transcoding an input video 110 to produce an output video 131 according to embodiments of our invention. The transcoding is based on a content of the video. The video includes frames 120. The video 110 is partitioned into a set of segments 115, e.g., a segment 117. The segment 117 can include one or more frames 120.

The content 140 of the segment 117 of the video is analyzed 150 and compared to a predetermined threshold 170 to determine if that segment is downsample resilient 155.

As defined herein, for the purpose of this specification and appended claims, a downsample resilient segment of a video is a segment, which after being downsampled and transcoded can be decoded and upsampled to a decoded segment, such that a resolution and a quality of the decoded segment are substantially equal to a resolution and a quality of the downsample resilient segment before downsampling and transcoding.

If the segment 117 is a downsample resilient segment, a downsampled version 160 of the segment 117 is sent to an encoder 130. Otherwise, a full-resolution version 165 of the segment 117 is sent to the encoder 130. The method 100 is repeated for all segments of the video.

We transcode the input video using a set of full-resolution segments and a set of downsampled segments to produce an output video in a second encoded format, wherein the output video includes at least two segments with different resolutions.
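The per-segment routing described above can be sketched as follows. This is a minimal illustration, not the implementation of the invention: the names `route_segments`, `is_resilient` and `downsample` are hypothetical, and a segment is assumed to be a single frame stored as rows of sample values.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: one segment = one frame = rows of samples.
Segment = List[List[float]]


def route_segments(
    segments: Sequence[Segment],
    is_resilient: Callable[[Segment], bool],
    downsample: Callable[[Segment], Segment],
) -> List[Tuple[str, Segment]]:
    """Return one (label, data) pair per segment, in input order.

    Downsample resilient segments are passed on in downsampled form,
    all other segments are passed on at full resolution.
    """
    routed = []
    for seg in segments:
        if is_resilient(seg):
            routed.append(("downsampled", downsample(seg)))
        else:
            routed.append(("full", seg))
    return routed
```

The resilience test and the downsampler are passed in as functions, so the same loop serves both the quality-metric analysis and the bitstream-information analysis.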

We analyze the content of the video, on a segment by segment basis, to determine if a particular segment is downsample resilient. One embodiment analyzes 150 the segment 117, based on a full-resolution video 144. An alternative embodiment analyzes a bitstream information 146 retrieved from the encoded video.

FIG. 2 shows a method 200 for determining the downsample resilient segments 270 based on metrics of the quality of a full-resolution video decoded from the input video 110. The full-resolution segment 165 of the video is first downsampled 220 and then upsampled 230 to produce a reference signal 235, such that a resolution of the reference signal 235 is equal to the resolution of the segment 165. We measure 240 a difference between the reference signal 235 and the full-resolution segment 165, and the result of the measurement 245 is compared 260 with a predetermined threshold 250 to identify the segment as a downsample resilient segment 270.

The thresholds 250 can include one threshold, or separate thresholds for horizontal and vertical downsampling, respectively. Furthermore, we can determine optimal downsampling parameters by varying a horizontal scale factor and a vertical scale factor for the downsampling 220.

The measure of difference can be, for example, a mean-squared error (MSE) or a mean-absolute error between the reference signal 235 and the full-resolution segment 165 decoded from the input video 110.
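A minimal sketch of the test of FIG. 2, under stated assumptions: a segment is a single frame stored as rows of samples, the scale factors are integers, downsampling is plain decimation without an anti-alias filter, and upsampling is sample repetition. None of these choices is prescribed by the description, and all function names are hypothetical.

```python
def downsample(frame, sx, sy):
    """Keep every sx-th column and every sy-th row (plain decimation)."""
    return [row[::sx] for row in frame[::sy]]


def upsample(small, sx, sy, width, height):
    """Repeat samples so the result is width x height again."""
    return [[small[y // sy][x // sx] for x in range(width)]
            for y in range(height)]


def mse(a, b):
    """Mean-squared error between two frames of equal size."""
    count = sum(len(row) for row in a)
    total = sum((p - q) ** 2
                for row_a, row_b in zip(a, b)
                for p, q in zip(row_a, row_b))
    return total / count


def is_downsample_resilient(frame, sx, sy, threshold):
    """Compare the down/up-sampled reference with the original frame."""
    height, width = len(frame), len(frame[0])
    reference = upsample(downsample(frame, sx, sy), sx, sy, width, height)
    return mse(frame, reference) < threshold
```

A flat frame passes the test for any threshold above zero, while a one-sample checkerboard, whose detail is destroyed by decimation, fails it.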

FIG. 3 shows a method 300 for determining the downsample resilient segments based on bitstream information 340 retrieved from the set of segments 115 of an encoded video 110, e.g., a segment 310. Examples of the bitstream information 340 include, but are not limited to, motion vectors 320 and discrete cosine transform (DCT) coefficients 330.

By analyzing the DCT coefficients extracted from the encoded video, we can determine if the segment 310 is downsample resilient. If most of the high-frequency components from the input bitstream are zero, then there are typically few fine details or sharp edges in the segment, and the segment is more likely to be downsample resilient.
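The DCT-based test can be sketched as follows. The sketch assumes the quantized coefficients are available as square blocks indexed `[u][v]`, treats a coefficient as high-frequency when `u + v >= cutoff`, and interprets "most" as a minimum fraction of zeros. The cutoff, the fraction and the function names are illustrative assumptions, not values from the description.

```python
def high_freq_zero_fraction(blocks, cutoff=8):
    """Fraction of high-frequency coefficients (u + v >= cutoff) that are zero."""
    zeros = 0
    total = 0
    for block in blocks:
        for u, row in enumerate(block):
            for v, coeff in enumerate(row):
                if u + v >= cutoff:
                    total += 1
                    if coeff == 0:
                        zeros += 1
    # With no high-frequency positions at all, nothing can be lost.
    return zeros / total if total else 1.0


def likely_resilient_from_dct(blocks, cutoff=8, min_zero_fraction=0.9):
    """True when most high-frequency coefficients of the segment are zero."""
    return high_freq_zero_fraction(blocks, cutoff) >= min_zero_fraction
```

Because the coefficients are read from the bitstream, this test needs no downsampled or upsampled copy of the segment.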

Accordingly, by comparing 360 the bitstream information 340, such as motion vectors 320 or DCT coefficients 330, with thresholds 350, we determine if the segment 310 is a downsample resilient segment. Moreover, by using a variety of thresholds 350, e.g., for vertical and horizontal downsampling of different magnitudes, we can determine scaling factors 370 for the subsequent downsampling. For example, if the magnitudes of both the vertical motion vectors and the horizontal motion vectors are less than the predetermined vertical and horizontal thresholds, then both the vertical and horizontal scaling factors are 1, i.e., the segment 310 is not downsample resilient.

If the magnitude of the vertical motion vector is greater than the threshold for the vertical scale factor of 2, but less than the threshold for the vertical scale factor of 3, then the vertical scaling factor is 2. Similarly, the horizontal scaling factor is determined by comparing the magnitude of the horizontal motion vector with a number of horizontal thresholds. Typically, the scaling factors have magnitudes of powers of two, e.g., 1, 2, 4, 8.

The horizontal scaling factor does not have to be equal to the vertical scaling factor. Furthermore, in one embodiment the horizontal threshold is part of a set of horizontal thresholds, and the vertical threshold is part of a set of vertical thresholds, and each horizontal threshold and each vertical threshold corresponds to a particular horizontal and vertical scaling factor, respectively.
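The mapping from motion vector magnitudes to scaling factors through sets of thresholds can be sketched as follows. The threshold values and the names `H_THRESHOLDS`, `V_THRESHOLDS`, `scale_factor` and `scale_factors` are hypothetical and chosen only for illustration.

```python
# Hypothetical threshold sets: (threshold, scaling factor) pairs sorted by
# ascending threshold. A magnitude above a threshold earns that factor.
H_THRESHOLDS = [(4.0, 2), (16.0, 4)]
V_THRESHOLDS = [(6.0, 2), (20.0, 4)]


def scale_factor(magnitude, thresholds):
    """Return the factor of the highest threshold that the magnitude exceeds."""
    factor = 1  # below every threshold: keep full resolution
    for threshold, candidate in thresholds:
        if magnitude > threshold:
            factor = candidate
        else:
            break
    return factor


def scale_factors(mv_h, mv_v):
    """Determine the horizontal and vertical factors independently."""
    return (scale_factor(abs(mv_h), H_THRESHOLDS),
            scale_factor(abs(mv_v), V_THRESHOLDS))
```

For example, `scale_factors(5.0, 3.0)` returns `(2, 1)`: the horizontal magnitude exceeds only the first horizontal threshold, and the vertical magnitude exceeds none, so only the horizontal resolution is reduced.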

EXAMPLES

FIG. 4 shows a transcoder according to one embodiment of the invention. The input video bitstream 110 is processed by a video decoder 420 to produce a full-resolution video 425, and macroblock information including motion vectors 415, and coding modes 417.

An adaptive resolution selector 430 determines the pair of resolution scale factors (sx, sy) 435 for the horizontal and vertical directions according to outputs of the video decoder 420. The adaptive resolution selector 430 determines whether the system transcodes the full-resolution video 425 or a reduced-resolution video 445, and what the scale factors are in each dimension for downsampling 440. For instance, resolution scale factors of (1, 1) imply full-resolution transcoding, while resolution scale factors of (2, 1) imply horizontal down-sampling by a factor of two and no down-sampling in the vertical direction. The scale factors can have other values, e.g., 3, 4, 3.5. The resolution of the video 445 can change adaptively over time.

The spatial resolution is signaled at certain points in the bitstream. For instance, in the H.264/AVC coding format, the spatial resolution of frames in a coded video sequence is allowed to change at an instantaneous decoding refresh (IDR) picture. A new spatial resolution of frames in a coded video sequence is signaled by the sequence parameter set (SPS) syntax, as part of an IDR access unit. Similarly, in the MPEG-2 coding format, a change in spatial resolution can be signaled in a sequence header.

When the transcoder adapts the spatial resolution of the current frame and subsequent frames, the system can either wait until the next IDR access unit in the case of H.264/AVC, or the sequence header, in the case of MPEG-2, or transcode the frame in such a way that the change takes effect immediately. A decision for a group of frames or pictures (GOP) also can be made based on the collective set of resolution selections for several frames, including both previous and subsequent frames.

If the reduced resolution is selected, then the full-resolution video 425 is down-sampled 440 by the resolution scaling factors 435. Motion vector mapping is performed according to the resolution scale factors using outputs of the video decoder to yield mapped motion vectors 415. Quantizer and mode selection are also performed according to the resolution scale factors using outputs of the video decoder to yield output quantizers and output coding modes 417.
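The description does not specify how motion vectors are mapped. One common choice, shown here purely as an assumption, is to divide each vector component by the scale factor of its direction. The name `map_motion_vector` is hypothetical, and rounding to the motion vector precision of the output format is deliberately left out.

```python
def map_motion_vector(mv, sx, sy):
    """Scale a (horizontal, vertical) motion vector to the reduced resolution.

    Assumption: simple division by the resolution scale factors. A real
    encoder would further round the result to its motion vector precision.
    """
    mv_x, mv_y = mv
    return (mv_x / sx, mv_y / sy)
```

With scale factors (2, 1), a vector of (8, -6) maps to (4.0, -6.0): the horizontal displacement is halved along with the horizontal resolution, and the vertical displacement is unchanged.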

The video encoder encodes 450 either the full-resolution or reduced-resolution video according to the mapped motion vectors, output quantizers, and output coding modes to produce a transcoded output bitstream 460.

Adaptive Resolution Selection Based on Segment Quality

FIG. 5 shows an adaptive-resolution transcoder based on frame quality metrics according to an embodiment of the invention. Each segment of the video bitstream 110, which can be represented as a frame or field, is decoded 520 to a full-resolution video 525 of the segment and downsampled 540 horizontally and/or vertically by the resolution scaling factors 535. The resulting lower-resolution frame 545 is then upsampled 550 and filtered, resulting in a down/up-sampled segment 555 whose resolution matches the originally decoded video 525. The difference 547 between this down/up-sampled frame and the originally decoded frame is taken and then passed to an adaptive resolution selector.

The adaptive resolution selector applies a measure 537 to the difference 547 between the down/up-sampled segment and the originally decoded segment. This measure is compared to a threshold, or a set of thresholds 539. For example, the measure is the MSE. If down/up-sampling the frame does not significantly degrade the image quality, then the MSE is small. In that case, transcoding to a reduced resolution should not significantly degrade the overall frame quality, so the adaptive resolution selector switches to the reduced-resolution mode when the MSE is less than a given threshold. However, if the MSE is greater than the threshold, then the transcoder switches to the full-resolution mode to avoid a significant decrease in frame quality. Other measures based on the difference between the originally decoded frame and the down/up-sampled frame also can be used, e.g., sum of absolute differences (SAD).
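The selection among several candidate scale factor pairs can be sketched as follows. The sketch uses the mean absolute difference (the SAD divided by the number of samples) as the measure, integer scale factors, plain decimation and sample repetition. It also assumes that, among the candidates whose measure stays below the threshold, the pair removing the most samples is preferred; that preference and all names are illustrative assumptions.

```python
def down_up(frame, sx, sy):
    """Decimate by (sx, sy), then repeat samples back to the original size."""
    height, width = len(frame), len(frame[0])
    small = [row[::sx] for row in frame[::sy]]
    return [[small[y // sy][x // sx] for x in range(width)]
            for y in range(height)]


def mean_abs_diff(a, b):
    """Sum of absolute differences divided by the number of samples."""
    count = sum(len(row) for row in a)
    sad = sum(abs(p - q)
              for row_a, row_b in zip(a, b)
              for p, q in zip(row_a, row_b))
    return sad / count


def select_scale_factors(frame, candidates, threshold):
    """Pick the acceptable (sx, sy) pair that removes the most samples."""
    best = (1, 1)  # full resolution when no candidate is acceptable
    for sx, sy in candidates:
        acceptable = mean_abs_diff(frame, down_up(frame, sx, sy)) < threshold
        if acceptable and sx * sy > best[0] * best[1]:
            best = (sx, sy)
    return best
```

A frame of one-sample-wide vertical stripes illustrates why the two directions are treated separately: vertical decimation loses nothing, horizontal decimation destroys the stripes, so the selector settles on (1, 2).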

After the resolution has been selected, the full or reduced-resolution video frame is passed to the reduced-complexity encoder 450, which uses parameters 415 and 417, mapped from the input bitstream, to produce a transcoded output bitstream 460. The parameters can include motion vectors, macroblock modes, and quantizer information.

Adaptive Resolution Selection Based on Compressed Data

FIG. 6 shows an adaptive-resolution transcoder based on an encoded video 110. In this embodiment, the input to the adaptive resolution selector is data extracted directly from the input video bitstream. This method eliminates the need for up-sampling and differencing, as shown in FIG. 5.

One example of extracted bitstream information that can be used to decide whether to switch to a lower resolution is the magnitude of horizontal and/or vertical motion vectors between frames. If the average magnitude 635 of horizontal motion vectors between two frames is large compared to thresholds 637, then it is likely that the amount of motion between those two frames is large. Because motion typically causes blur when a frame is acquired with a camera, it is likely that pairs of frames with large horizontal motion vector magnitudes degrade less from a down/up-sampling process than pairs of frames with little or no motion. The adaptive resolution selector can therefore switch to a reduced horizontal resolution mode when the average horizontal motion vector magnitude is above some given threshold. A similar method can be applied to vertical motion vectors.
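The motion-based decision can be sketched as follows, assuming the motion vectors of a frame are available as (horizontal, vertical) pairs, one per macroblock. The function names and the use of a single threshold per direction are illustrative assumptions.

```python
def average_magnitudes(motion_vectors):
    """Average absolute horizontal and vertical components of the vectors."""
    count = len(motion_vectors)
    if count == 0:
        return (0.0, 0.0)  # e.g., an intra-coded frame carries no vectors
    avg_h = sum(abs(mv_x) for mv_x, _ in motion_vectors) / count
    avg_v = sum(abs(mv_y) for _, mv_y in motion_vectors) / count
    return (avg_h, avg_v)


def reduced_directions(motion_vectors, h_threshold, v_threshold):
    """Return (reduce_horizontal, reduce_vertical) for the frame."""
    avg_h, avg_v = average_magnitudes(motion_vectors)
    return (avg_h > h_threshold, avg_v > v_threshold)
```

For a horizontal camera pan, the horizontal average is large and the vertical average is near zero, so only the horizontal resolution is reduced.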

Another example of an input to the adaptive resolution selector is the DCT coefficients extracted from the input bitstream. If most of the high-frequency components from the input bitstream are zero, then there are few fine details or sharp edges in the corresponding video frame. Therefore, the frame can be transcoded using the lower resolution. If there is a significant amount of high-frequency coefficient activity, then the resolution remains the same. The horizontal and vertical resolution scale factors can be different.

Timing of Resolution Change

In some embodiments, the transcoding is performed according to a mode of the transcoding, e.g., instantaneous, predictive, and delayed modes.

In the instantaneous mode, the adaptive resolution selector analyzes the characteristics of the current input frame. If a decision is made to change the resolution, then the frame is immediately transcoded to an instantaneous decoding refresh (IDR) picture, i.e., the downsampled segments are immediately transcoded after the downsampling. However, transcoding too many frames to IDR pictures can reduce coding efficiency.

The instantaneous mode can limit the frequency of changes of the resolution. This mode can restrict the resolution changes to GOP boundaries only. Because all predicted frames and their corresponding reference frames have the same resolution, resolution changes also can be limited, for example, to I or P input frames to reduce complexity and maintain coding efficiency.

In the predictive mode, the adaptive resolution selector measures characteristics from a series of frames or GOP and uses the characteristics to decide whether to initiate a resolution change on the next GOP. In one embodiment, we measure a characteristic of a current segment in the set of segments and select a next segment into the set of downsample resilient segments based on the characteristic.

Because this decision is made before a GOP is transcoded, the resolution change and transcoding operations can be performed concurrently, thus reducing the complexity and cost.
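The predictive mode can be sketched as a schedule in which the characteristic measured on each GOP decides the resolution of the following GOP. The sketch assumes one scalar characteristic per GOP (for example, an average motion vector magnitude), a single threshold above which the reduced resolution is chosen, and full resolution for the first GOP, which has no predecessor. These choices and the name `predictive_schedule` are illustrative assumptions.

```python
def predictive_schedule(gop_measures, threshold):
    """Decide, per GOP, whether to transcode at reduced resolution.

    gop_measures[i] is the characteristic measured on GOP i. The decision
    for GOP i + 1 uses only the measure of GOP i, so it is available
    before GOP i + 1 is transcoded.
    """
    if not gop_measures:
        return []
    reduced = [False]  # assumption: the first GOP stays at full resolution
    for measure in gop_measures[:-1]:
        reduced.append(measure > threshold)
    return reduced
```

With measures `[2.0, 12.0, 11.0, 1.0]` and a threshold of 8.0 the schedule is `[False, False, True, True]`: each decision lags the measurement by one GOP, which is the price paid for knowing the resolution before transcoding starts.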

In the delayed mode, each segment includes frames for a group of pictures (GOP), and characteristics of the frames in the current GOP are buffered and measured. Then, a decision is made whether to change the resolution of the current GOP, or to initiate a change within the GOP using the characteristics of the frames. Although both embodiments can be used in this mode, the second embodiment (based on compressed data) is more suitable because the activity measure in the adaptive resolution selector does not require frame buffers.

Although the invention has been described with reference to certain preferred embodiments, it is to be understood that various other adaptations and modifications can be made within the spirit and scope of the invention. Therefore, it is the object of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the invention.

CLAIMS

1. A method for transcoding an input video in a first encoded format to an output video in a second encoded format, wherein the videos include a set of segments and each segment includes at least one frame, comprising a processor for performing steps of the method, comprising the steps of: determining a set of downsample resilient segments in the input video; downsampling, adaptively, the set of downsample resilient segments to produce a set of downsampled segments; and transcoding the input video using the set of downsampled segments to produce the output video including at least two segments with different resolutions.

2. The method of claim 1, wherein the determining further comprises: modifying a segment of the input video by downsampling and upsampling operations to produce a reference signal; measuring a difference between the segment and the reference signal; and selecting the segment as the downsample resilient segment based on the difference and a threshold.

3. The method of claim 1, wherein the determining further comprises: selecting a segment as the downsample resilient segment based on a result of comparison of a motion vector of the segment with a predetermined threshold.

4. The method of claim 1, further comprising: comparing discrete cosine transform coefficients extracted from a segment with a threshold; and selecting the segment as the downsample resilient segment based on the comparing.

5. The method of claim 1, further comprising: associating each segment in the set of downsample resilient segments with a vertical scaling factor and with a horizontal scaling factor such that the downsampling is performed according to values of the scaling factors.

6. The method of claim 5, wherein the vertical scaling factor equals 1, and the horizontal scaling factor is greater than 1.

7. The method of claim 5, wherein the horizontal scaling factor equals 1, and the vertical scaling factor is greater than 1.

8. The method of claim 5, wherein the horizontal scaling factor equals the vertical scaling factor.

9. The method of claim 5, wherein the horizontal scaling factor differs from the vertical scaling factor.

10. The method of claim 1, wherein each segment in the set of segments has a constant resolution.

11. The method of claim 1, further comprising: determining a set of full-resolution segments in the input video, wherein the transcoding further uses the set of full-resolution segments.

12. The method of claim 1, wherein the transcoding is performed according to a mode of the transcoding.

13. The method of claim 12, wherein the mode of the transcoding is instantaneous, such that the downsampled segments are immediately transcoded after the downsampling based on characteristics of the current input frame.

14. The method of claim 12, wherein the mode of the transcoding is predictive, wherein the determining further comprises: measuring a characteristic of a current segment in the set of segments; and selecting a next segment into the set of downsample resilient segments based on the characteristic.

15. The method of claim 12, wherein each segment includes frames for a group of pictures (GOP), and the mode of the transcoding is delayed, and the determining uses characteristics of the frames.

16. An adaptive video transcoder, comprising: an adaptive resolution selector configured to determine a set of downsample resilient segments in an input video; a downsampling module configured to adaptively downsample the set of downsample resilient segments to produce a set of downsampled segments; and a transcoding module configured to transcode the input video using the set of adaptively downsampled segments to produce an output video having at least two segments of different resolution.

17. The adaptive transcoder of claim 16, wherein the adaptive resolution selector is further configured to determine a vertical scaling factor and a horizontal scaling factor for each segment in the set of downsample resilient segments, and wherein the downsampling module is further configured to downsample according to the scaling factors.

18. A method for adaptive video transcoding of an input video in a first encoded format into an output video in a second encoded format, wherein each segment of the input video has a constant resolution, comprising a processor for performing steps of the method, comprising the steps of: determining a set of downsample resilient segments in the input video; and transcoding the input video into the output video, such that a resolution of only the set of downsample resilient segments in the output video is reduced.

19. The method of claim 18, the transcoding further comprising: modifying a segment of the input video by downsampling and upsampling to produce a reference signal; comparing the segment of the input video with the reference signal to determine scaling factors; and downsampling the segment of the input video according to the scaling factors.

20. The method of claim 18, wherein the input video includes bitstream information of a segment of the video, further comprising: determining scaling factors based on the bitstream information of the segment; and downsampling the segment according to the scaling factors.

21. The method of claim 20, wherein the bitstream information includes a horizontal motion vector and a vertical motion vector, and the scaling factors include a horizontal scale factor and a vertical scale factor, further comprising: comparing a magnitude of the horizontal motion vector with a horizontal threshold to determine the horizontal scale factor; and comparing a magnitude of the vertical motion vector with a vertical threshold to determine the vertical scale factor.

22. The method of claim 21, wherein the horizontal threshold is part of a set of horizontal thresholds, and the vertical threshold is part of a set of vertical thresholds, and wherein each horizontal threshold and each vertical threshold corresponds to a particular horizontal and vertical scaling factor, respectively.

23. The method of claim 20, wherein the bitstream information includes discrete cosine transform (DCT) coefficients, and the determining is based on the DCT coefficients.

24. The method of claim 18, wherein the second encoded format is H.264/AVC.